Skip to content

Conversation

cor3ntin
Copy link
Contributor

UTF-16 to UTF-16 conversions seems widespread,
and lone surrogate have a distinct representation in UTF-32.

Lets not warn on this case to make the warning easier to adopt. This follows SG-16 guideline

https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2025/p3695r2.html#changes-since-r1

Fixes #163719

@llvmbot llvmbot added clang Clang issues not falling into any other category clang:frontend Language frontend issues, e.g. anything involving "Sema" labels Oct 17, 2025
@llvmbot
Copy link
Member

llvmbot commented Oct 17, 2025

@llvm/pr-subscribers-clang

Author: Corentin Jabot (cor3ntin)

Changes

UTF-16 to UTF-16 conversions seems widespread,
and lone surrogate have a distinct representation in UTF-32.

Lets not warn on this case to make the warning easier to adopt. This follows SG-16 guideline

https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2025/p3695r2.html#changes-since-r1

Fixes #163719


Full diff: https://github.com/llvm/llvm-project/pull/163927.diff

2 Files Affected:

  • (modified) clang/lib/Sema/SemaChecking.cpp (+8-1)
  • (modified) clang/test/SemaCXX/warn-implicit-unicode-conversions.cpp (+4-4)
diff --git a/clang/lib/Sema/SemaChecking.cpp b/clang/lib/Sema/SemaChecking.cpp
index 4f409ca0f414d..0300d09be0420 100644
--- a/clang/lib/Sema/SemaChecking.cpp
+++ b/clang/lib/Sema/SemaChecking.cpp
@@ -12309,13 +12309,20 @@ static void DiagnoseMixedUnicodeImplicitConversion(Sema &S, const Type *Source,
                                                    SourceLocation CC) {
   assert(Source->isUnicodeCharacterType() && Target->isUnicodeCharacterType() &&
          Source != Target);
+
+  // Lone surrogates have a distinct representation in UTF-32.
+  // Converting betweem UTF-16 and UTF-32 codepoint seems very widespread,
+  // so don't warn on such conversion.
+  if(Source->isChar16Type() && Target->isChar32Type())
+      return;
+
   Expr::EvalResult Result;
   if (E->EvaluateAsInt(Result, S.getASTContext(), Expr::SE_AllowSideEffects,
                        S.isConstantEvaluatedContext())) {
     llvm::APSInt Value(32);
     Value = Result.Val.getInt();
     bool IsASCII = Value <= 0x7F;
-    bool IsBMP = Value <= 0xD7FF || (Value >= 0xE000 && Value <= 0xFFFF);
+    bool IsBMP = Value <= 0xDFFF || (Value >= 0xE000 && Value <= 0xFFFF);
     bool ConversionPreservesSemantics =
         IsASCII || (!Source->isChar8Type() && !Target->isChar8Type() && IsBMP);
 
diff --git a/clang/test/SemaCXX/warn-implicit-unicode-conversions.cpp b/clang/test/SemaCXX/warn-implicit-unicode-conversions.cpp
index fcff006d0e028..f17f20ca25295 100644
--- a/clang/test/SemaCXX/warn-implicit-unicode-conversions.cpp
+++ b/clang/test/SemaCXX/warn-implicit-unicode-conversions.cpp
@@ -14,7 +14,7 @@ void test(char8_t u8, char16_t u16, char32_t u32) {
     c16(u32); // expected-warning {{implicit conversion from 'char32_t' to 'char16_t' may lose precision and change the meaning of the represented code unit}}
 
     c32(u8);  // expected-warning {{implicit conversion from 'char8_t' to 'char32_t' may change the meaning of the represented code unit}}
-    c32(u16); // expected-warning {{implicit conversion from 'char16_t' to 'char32_t' may change the meaning of the represented code unit}}
+    c32(u16);
     c32(u32);
 
 
@@ -30,7 +30,7 @@ void test(char8_t u8, char16_t u16, char32_t u32) {
     c16(char32_t(0x7f));
     c16(char32_t(0x80));
     c16(char32_t(0xD7FF));
-    c16(char32_t(0xD800)); // expected-warning {{implicit conversion from 'char32_t' to 'char16_t' changes the meaning of the code unit '<0xD800>'}}
+    c16(char32_t(0xD800));
     c16(char32_t(0xE000));
     c16(char32_t(U'🐉')); // expected-warning {{implicit conversion from 'char32_t' to 'char16_t' changes the meaning of the code point '🐉'}}
 
@@ -44,8 +44,8 @@ void test(char8_t u8, char16_t u16, char32_t u32) {
     c32(char16_t(0x80));
 
     c32(char16_t(0xD7FF));
-    c32(char16_t(0xD800)); // expected-warning {{implicit conversion from 'char16_t' to 'char32_t' changes the meaning of the code unit '<0xD800>'}}
-    c32(char16_t(0xDFFF)); // expected-warning {{implicit conversion from 'char16_t' to 'char32_t' changes the meaning of the code unit '<0xDFFF>'}}
+    c32(char16_t(0xD800));
+    c32(char16_t(0xDFFF));
     c32(char16_t(0xE000));
     c32(char16_t(u'☕'));
 

Copy link

github-actions bot commented Oct 17, 2025

✅ With the latest revision this PR passed the C/C++ code formatter.

UTF-16 to UTF-16 conversions seems widespread,
and lone surrogate have a distinct representation in UTF-32.

Lets not warn on this case to make the warning easier to adopt.
This follows SG-16 guideline

https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2025/p3695r2.html#changes-since-r1

Fixes llvm#163719
Copy link
Collaborator

@AaronBallman AaronBallman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for working on this!

Source != Target);

// Lone surrogates have a distinct representation in UTF-32.
// Converting betweem UTF-16 and UTF-32 codepoint seems very widespread,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// Converting betweem UTF-16 and UTF-32 codepoint seems very widespread,
// Converting between UTF-16 and UTF-32 codepoints seems very widespread,

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it make sense to allow this as an opt-in warning so folks who want to catch those conversions still can?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did consider that but given feedback we probably want to backport this change so I kept it simple

@cor3ntin
Copy link
Contributor Author

@AaronBallman

Copy link
Collaborator

@AaronBallman AaronBallman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, though this should have a release note

@cor3ntin
Copy link
Contributor Author

@AaronBallman nope, i want to backport it :)

@AaronBallman
Copy link
Collaborator

@AaronBallman nope, i want to backport it :)

Sure! Make sure the backport has the release note then. :-D

@cor3ntin cor3ntin merged commit 3a15687 into llvm:main Oct 19, 2025
10 checks passed
@llvm-ci
Copy link
Collaborator

llvm-ci commented Oct 19, 2025

LLVM Buildbot has detected a new failure on builder llvm-clang-aarch64-darwin running on doug-worker-4 while building clang at step 6 "test-build-unified-tree-check-all".

Full details are available at: https://lab.llvm.org/buildbot/#/builders/190/builds/29305

Here is the relevant piece of the build log for the reference
Step 6 (test-build-unified-tree-check-all) failure: test (failure)
******************** TEST 'LLVM :: ExecutionEngine/OrcLazy/multiple-compile-threads-basic.ll' FAILED ********************
Exit Code: 2

Command Output (stdout):
--
# RUN: at line 1
/Volumes/RAMDisk/buildbot-root/aarch64-darwin/build/bin/lli -jit-kind=orc-lazy -compile-threads=2 -thread-entry hello /Users/buildbot/buildbot-root/llvm-project/llvm/test/ExecutionEngine/OrcLazy/multiple-compile-threads-basic.ll | /Volumes/RAMDisk/buildbot-root/aarch64-darwin/build/bin/FileCheck /Users/buildbot/buildbot-root/llvm-project/llvm/test/ExecutionEngine/OrcLazy/multiple-compile-threads-basic.ll
# executed command: /Volumes/RAMDisk/buildbot-root/aarch64-darwin/build/bin/lli -jit-kind=orc-lazy -compile-threads=2 -thread-entry hello /Users/buildbot/buildbot-root/llvm-project/llvm/test/ExecutionEngine/OrcLazy/multiple-compile-threads-basic.ll
# .---command stderr------------
# | PLEASE submit a bug report to https://github.com/llvm/llvm-project/issues/ and include the crash backtrace and instructions to reproduce the bug.
# | Stack dump:
# | 0.	Program arguments: /Volumes/RAMDisk/buildbot-root/aarch64-darwin/build/bin/lli -jit-kind=orc-lazy -compile-threads=2 -thread-entry hello /Users/buildbot/buildbot-root/llvm-project/llvm/test/ExecutionEngine/OrcLazy/multiple-compile-threads-basic.ll
# |  #0 0x000000010155c918 llvm::sys::PrintStackTrace(llvm::raw_ostream&, int) (/Volumes/RAMDisk/buildbot-root/aarch64-darwin/build/bin/lli+0x100f28918)
# |  #1 0x000000010155a6c8 llvm::sys::RunSignalHandlers() (/Volumes/RAMDisk/buildbot-root/aarch64-darwin/build/bin/lli+0x100f266c8)
# |  #2 0x000000010155d418 SignalHandler(int, __siginfo*, void*) (/Volumes/RAMDisk/buildbot-root/aarch64-darwin/build/bin/lli+0x100f29418)
# |  #3 0x000000019bd2f584 (/usr/lib/system/libsystem_platform.dylib+0x18047b584)
# |  #4 0x00000101010b10ac
# |  #5 0x00000001010bc4fc llvm::orc::ExecutionSession::removeJITDylibs(std::__1::vector<llvm::IntrusiveRefCntPtr<llvm::orc::JITDylib>, std::__1::allocator<llvm::IntrusiveRefCntPtr<llvm::orc::JITDylib>>>) (/Volumes/RAMDisk/buildbot-root/aarch64-darwin/build/bin/lli+0x100a884fc)
# |  #6 0x00000001010bc2ac llvm::orc::ExecutionSession::endSession() (/Volumes/RAMDisk/buildbot-root/aarch64-darwin/build/bin/lli+0x100a882ac)
# |  #7 0x0000000101148288 llvm::orc::LLJIT::~LLJIT() (/Volumes/RAMDisk/buildbot-root/aarch64-darwin/build/bin/lli+0x100b14288)
# |  #8 0x000000010114cc14 llvm::orc::LLLazyJIT::~LLLazyJIT() (/Volumes/RAMDisk/buildbot-root/aarch64-darwin/build/bin/lli+0x100b18c14)
# |  #9 0x000000010063d0a0 runOrcJIT(char const*) (/Volumes/RAMDisk/buildbot-root/aarch64-darwin/build/bin/lli+0x1000090a0)
# | #10 0x0000000100638594 main (/Volumes/RAMDisk/buildbot-root/aarch64-darwin/build/bin/lli+0x100004594)
# | #11 0x000000019b973154
# `-----------------------------
# error: command failed with exit status: -11
# executed command: /Volumes/RAMDisk/buildbot-root/aarch64-darwin/build/bin/FileCheck /Users/buildbot/buildbot-root/llvm-project/llvm/test/ExecutionEngine/OrcLazy/multiple-compile-threads-basic.ll
# .---command stderr------------
# | FileCheck error: '<stdin>' is empty.
# | FileCheck command line:  /Volumes/RAMDisk/buildbot-root/aarch64-darwin/build/bin/FileCheck /Users/buildbot/buildbot-root/llvm-project/llvm/test/ExecutionEngine/OrcLazy/multiple-compile-threads-basic.ll
# `-----------------------------
# error: command failed with exit status: 2

--

********************


Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

clang:frontend Language frontend issues, e.g. anything involving "Sema" clang Clang issues not falling into any other category

Projects

None yet

Development

Successfully merging this pull request may close these issues.

-Wcharacter-conversion shouldn't warn about char16_t -> char32_t conversions

4 participants